Multiple-objective portfolio optimization¶
INTRODUCTION¶
Your task is to solve a multiple-objective portfolio optimization problem.
- Use the basic Markowitz model from 1952 (see Lecture 1).
- "Solving" here means constructing Pareto front approximations.
- The dataset is the same as for the portfolio game part 1 (bundle1.zip).
- The dataset consists of the historical prices of 20 assets.
- The bundle contains 20 files (*.txt) linked to different assets.
- The name of the file suggests the asset's name.
- The structure of every file is as follows:
- The first line contains the name of the asset.
- The second line provides the number of data points N.
- The following N lines are data points with the structure: time, price.
- The historical timeline for all assets is time $\in$ [0,100].
- Future predictions should be calculated for time = 200.
Goal:
- Load data, make predictions, and build the model.
- Illustrate your predictions (this can be done in the Jupyter notebook).
- Then, implement the WSM and ECM methods (see the tutorial on quadratic programming provided below).
- Run your implementations for different calculation limits (e.g., the number of weight vectors for WSM). Compare the methods' efficiency in finding unique Pareto optimal solutions. Finally, illustrate generated Pareto fronts.
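Before the full implementation, the weighted-sum idea can be sketched in plain numpy on a hypothetical 2-asset instance (the returns and covariance below are made up for illustration), replacing the QP solver with a grid search over the single free weight:

```python
import numpy as np

# Toy 2-asset problem (illustrative numbers, not from the bundle):
mu = np.array([0.10, 0.04])            # predicted returns
Sigma = np.array([[0.08, 0.01],
                  [0.01, 0.02]])       # covariance matrix

# WSM: for each lambda, minimize  -lambda * mu^T w + (1 - lambda) * w^T Sigma w
# over w1 in [0, 1], with w2 = 1 - w1 (full-investment constraint).
w1_grid = np.linspace(0.0, 1.0, 1001)
W = np.column_stack([w1_grid, 1.0 - w1_grid])
front = []
for lam in np.linspace(0.0, 1.0, 11):
    scal = -lam * (W @ mu) + (1 - lam) * np.einsum("ij,jk,ik->i", W, Sigma, W)
    w = W[np.argmin(scal)]
    front.append((w @ mu, w @ Sigma @ w))   # (expected return, variance) point

front = np.array(front)                      # 11 Pareto front candidates
```

At lambda = 1 the scalarization recovers the maximum-return portfolio (all weight on asset 1), and at lambda = 0 the minimum-variance portfolio; in the real task the grid search is replaced by `solvers.qp` over 20 assets.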
Short tutorial on the cvxopt library for quadratic programming¶
import numpy as np
from cvxopt import matrix, solvers
QP Optimization Problem¶
General model:¶
$\max$ $\boldsymbol{c}^T\boldsymbol{x} - \dfrac{1}{2}\boldsymbol{x}^T\boldsymbol{Q}\boldsymbol{x}$
$s.t.$
$\boldsymbol{Gx} \leq \boldsymbol{h}$
$\boldsymbol{x} \geq \boldsymbol{0}$
But the library uses the following form:¶
$\min$ $\boldsymbol{c}^T\boldsymbol{x} + \dfrac{1}{2}\boldsymbol{x}^T\boldsymbol{Q}\boldsymbol{x}$
$s.t.$
$\boldsymbol{Gx} \leq \boldsymbol{h}$
$\boldsymbol{Ax} = \boldsymbol{b}$
Example¶
$\min$ $2x_1^2+x_2^2+x_1x_2+x_1+x_2$
$s.t.$
$x_1 \geq 0$
$x_2 \geq 0$
$x_1 + x_2 = 1$
Hence:¶
Q = matrix([ [4.0, 1.0], [1.0, 2.0] ])  # each inner list is a COLUMN, not a row!
                                        # Q is twice the Hessian, because of the 1/2 factor
c = matrix([1.0, 1.0])                  # 2x1 column vector
A = matrix([1.0, 1.0], (1, 2))          # (1, 2) = dimensions (1 row, 2 columns)
b = matrix(1.0)
G = matrix([ [-1.0, 0.0], [0.0, -1.0] ])  # x >= 0 rewritten as -x <= 0
h = matrix([0.0, 0.0])
solQP=solvers.qp(Q, c, G, h, A, b)
     pcost       dcost       gap    pres   dres
 0:  1.8889e+00  7.7778e-01  1e+00  3e-16  2e+00
 1:  1.8769e+00  1.8320e+00  4e-02  2e-16  6e-02
 2:  1.8750e+00  1.8739e+00  1e-03  1e-16  5e-04
 3:  1.8750e+00  1.8750e+00  1e-05  1e-16  5e-06
 4:  1.8750e+00  1.8750e+00  1e-07  3e-16  5e-08
Optimal solution found.
print(solQP.keys())
dict_keys(['x', 'y', 's', 'z', 'status', 'gap', 'relative gap', 'primal objective', 'dual objective', 'primal infeasibility', 'dual infeasibility', 'primal slack', 'dual slack', 'iterations'])
print(solQP['x'])
print(solQP['primal objective'])
[ 2.50e-01]
[ 7.50e-01]

1.8750000000000182
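The reported optimum can be sanity-checked without a solver: the equality constraint lets us eliminate $x_2 = 1 - x_1$, leaving a one-dimensional quadratic that we can minimize on a grid.

```python
import numpy as np

# Eliminate the equality constraint: x2 = 1 - x1, so
# f(x1) = 2*x1**2 + (1-x1)**2 + x1*(1-x1) + x1 + (1-x1) = 2*x1**2 - x1 + 2
x1 = np.linspace(0.0, 1.0, 100001)
f = 2 * x1**2 - x1 + 2
i = np.argmin(f)
print(x1[i], f[i])  # ~0.25 and ~1.875, matching solvers.qp
```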
We can also solve LP problems:¶
$\min$ $\boldsymbol{c}^T\boldsymbol{x}$
$s.t.$
$\boldsymbol{Gx} \leq \boldsymbol{h}$
$\boldsymbol{Ax} = \boldsymbol{b}$ (optional)
Example¶
$\min$ $2x_1+x_2$
$s.t.$
$-x_1 +x_2 \leq 1$
$x_1 + x_2 \geq 2$
$x_2 \geq 0$
$x_1 - 2x_2 \leq 4$
G = matrix([ [-1.0, -1.0, 0.0, 1.0], [1.0, -1.0, -1.0, -2.0] ])
h = matrix([ 1.0, -2.0, 0.0, 4.0 ])
c = matrix([ 2.0, 1.0 ])
solLP = solvers.lp(c,G,h)
# Optionally, equality constraints can be added by passing A and b, as in solvers.qp(Q, c, G, h, A, b)
     pcost       dcost       gap    pres   dres   k/t
 0:  2.6471e+00 -7.0588e-01  2e+01  8e-01  2e+00  1e+00
 1:  3.0726e+00  2.8437e+00  1e+00  1e-01  2e-01  3e-01
 2:  2.4891e+00  2.4808e+00  1e-01  1e-02  2e-02  5e-02
 3:  2.4999e+00  2.4998e+00  1e-03  1e-04  2e-04  5e-04
 4:  2.5000e+00  2.5000e+00  1e-05  1e-06  2e-06  5e-06
 5:  2.5000e+00  2.5000e+00  1e-07  1e-08  2e-08  5e-08
Optimal solution found.
print(solLP.keys())
dict_keys(['x', 'y', 's', 'z', 'status', 'gap', 'relative gap', 'primal objective', 'dual objective', 'primal infeasibility', 'dual infeasibility', 'primal slack', 'dual slack', 'residual as primal infeasibility certificate', 'residual as dual infeasibility certificate', 'iterations'])
print(solLP['x'])
print(solLP['primal objective'])
[ 5.00e-01]
[ 1.50e+00]

2.499999989554308
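Assuming scipy is available, the same LP can be cross-checked with `scipy.optimize.linprog`, which also expects the inequalities in `A_ub x <= b_ub` form (so the `>=` constraint is again negated):

```python
from scipy.optimize import linprog

# Same LP in scipy's form; x1 is free, x2 >= 0 goes into the bounds.
c = [2.0, 1.0]
A_ub = [[-1.0, 1.0],    # -x1 +  x2 <= 1
        [-1.0, -1.0],   #  x1 +  x2 >= 2  ->  -x1 - x2 <= -2
        [1.0, -2.0]]    #  x1 - 2*x2 <= 4
b_ub = [1.0, -2.0, 4.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None), (0, None)])
print(res.x, res.fun)  # ~[0.5, 1.5], ~2.5 — the same optimum as cvxopt
```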
Portfolio optimization¶
Import the necessary libraries (numpy, pandas, matplotlib, cvxopt).
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from cvxopt import matrix, solvers
from sklearn.linear_model import Lasso
from scipy.signal import savgol_filter
# For reproducible randomness (if needed)
np.random.seed(42)
Load the data from *Part1.txt files. Each file’s first line is the asset name, second line is N, and then N lines with “time price”.
def load_asset_data(data_folder="data"):
asset_names = []
asset_times = []
asset_prices = []
txt_files = [f for f in os.listdir(data_folder) if f.endswith("Part1.txt")]
for fname in txt_files:
path = os.path.join(data_folder, fname)
with open(path, "r") as f:
# 1) asset name
asset_name = f.readline().strip()
# 2) number of data points
N_line = f.readline().strip()
N = int(N_line)
# 3) read time, price lines
times = []
prices = []
for _ in range(N):
line = f.readline().strip()
t_str, p_str = line.split()
times.append(float(t_str))
prices.append(float(p_str))
asset_names.append(asset_name)
asset_times.append(times)
asset_prices.append(prices)
print(f"Found {len(asset_names)} assets.")
print("First few asset names:", asset_names[:5])
return asset_names, asset_times, asset_prices
asset_names, asset_times, asset_prices = load_asset_data()
Found 20 assets. First few asset names: ['SafeAndCare', 'Moneymakers', 'Fuel4', 'CPU-XYZ', 'MarsProject']
def plot_asset_forecasts(asset_names, asset_times, asset_prices, forecast_results, training_offset=100, color="red"):
"""
Plots forecast data for multiple assets on a 5x4 grid.
Parameters:
asset_names (list): List of asset names.
asset_times (list of arrays/lists): Time points for each asset.
asset_prices (list of arrays/lists): Historical prices for each asset.
forecast_results (dict): Dictionary mapping asset names to forecast results for a specific method.
Each entry for an asset should contain:
- "price_at_training": forecast price at training time.
- "price_at_forecast": forecast price at forecast time.
- "full_reconstruction": (optional) tuple (recon_times, reconstruction) for the full reconstruction curve.
- "forecast_time": (optional) forecast time (defaults to 200 if not provided).
training_offset (int): The offset between forecast time and training time (default 100).
color (str): Color for all forecast markers and lines.
The function creates a grid of subplots (max 20 assets), where for each asset:
- Historical data is plotted as black scatter points.
- A training marker is plotted at (forecast_time - training_offset) using a circle marker.
- A forecast marker is plotted at forecast_time using an 'x' marker.
- If available, a reconstruction curve is plotted.
"""
# Set up a 5x4 grid (max 20 assets)
fig, axes = plt.subplots(5, 4, figsize=(20, 25))
axes = axes.flatten()
num_assets = min(len(asset_names), 20)
for i in range(num_assets):
ax = axes[i]
asset = asset_names[i]
times = np.array(asset_times[i])
prices = np.array(asset_prices[i])
# Plot historical data
ax.scatter(times, prices, label="Historical Prices", color="black")
# Retrieve forecast result for this asset
forecast = forecast_results[asset]
forecast_time = forecast.get("forecast_time", 200)
training_time = forecast_time - training_offset
# Plot training marker
ax.scatter([training_time], [forecast["price_at_training"]],
marker="o", color=color, s=100,
label=f"Training @ t={training_time}")
# Plot forecast marker
ax.scatter([forecast_time], [forecast["price_at_forecast"]],
marker="x", color=color, s=100,
label=f"Forecast @ t={forecast_time}")
# Plot reconstruction curve if available
if "full_reconstruction" in forecast:
recon_times, reconstruction = forecast["full_reconstruction"]
ax.plot(recon_times, reconstruction, label="Reconstruction",
color=color, linewidth=2)
ax.set_title(asset)
ax.set_xlabel("Time")
ax.set_ylabel("Price")
ax.legend(fontsize='small')
# Remove any unused subplots if there are fewer than 20 assets.
for j in range(num_assets, len(axes)):
fig.delaxes(axes[j])
plt.tight_layout()
plt.show()
all_results = {}
Perform a simple linear regression (degree 1) for each asset using times in [0, 100]. Extrapolate to time = 200 to get a predicted price, and convert the predicted price growth into a predicted return $\mu_i$.
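The return computation can be illustrated on a toy, exactly linear series (the numbers below are hypothetical, not from the bundle):

```python
import numpy as np

# Toy series: price(t) = 2*t + 10 on t in [0, 100].
t = np.arange(0, 101, dtype=float)
p = 2.0 * t + 10.0

slope, intercept = np.polyfit(t, p, deg=1)                 # recovers ~2 and ~10
price_at_training = np.polyval([slope, intercept], 100.0)  # ~210
price_at_forecast = np.polyval([slope, intercept], 200.0)  # ~410
mu = (price_at_forecast - price_at_training) / price_at_training
print(mu)  # (410 - 210) / 210 ~ 0.952
```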
import numpy as np
from sklearn.linear_model import Lasso
from scipy.signal import savgol_filter
def baseline_forecast(asset_names, asset_times, asset_prices, training_start=0, training_end=100, forecast_time=200):
"""
Returns a dictionary with, for each asset:
- "model_predictions": (training_times, model_predictions) on [training_start, training_end]
- "full_reconstruction": (recon_times, linear_reconstruction) on [training_start, forecast_time]
- "price_at_training": predicted price at training_end
- "price_at_forecast": predicted price at forecast_time
- "predicted_return": computed return
"""
baseline_pred = {}
for i, name in enumerate(asset_names):
times = np.array(asset_times[i])
prices = np.array(asset_prices[i])
# Use training data: times between training_start and training_end
mask = (times >= training_start) & (times <= training_end)
t_used = times[mask]
p_used = prices[mask]
coeffs = np.polyfit(t_used, p_used, deg=1)
# Model predictions on training interval
model_predictions = np.polyval(coeffs, t_used)
price_at_training = np.polyval(coeffs, training_end)
price_at_forecast = np.polyval(coeffs, forecast_time)
predicted_return = (price_at_forecast - price_at_training) / price_at_training
# Generate a new time vector for full reconstruction:
recon_times = np.linspace(training_start, forecast_time, forecast_time - training_start)
linear_reconstruction = np.polyval(coeffs, recon_times)
baseline_pred[name] = {
"model_predictions": (t_used, model_predictions),
"full_reconstruction": (recon_times, linear_reconstruction),
"price_at_training": price_at_training,
"price_at_forecast": price_at_forecast,
"predicted_return": predicted_return
}
return baseline_pred
linear_results = baseline_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200)
plot_asset_forecasts(asset_names, asset_times, asset_prices, linear_results, training_offset=100, color="red")
import numpy as np
from scipy.signal import savgol_filter
from sklearn.linear_model import Lasso
def build_candidate_library(x, candidate_frequencies):
"""
Build a design matrix with a constant, linear, quadratic term,
and for each candidate frequency, sine and cosine functions.
"""
features = []
feature_names = []
features.append(np.ones_like(x))
feature_names.append('1')
features.append(x)
feature_names.append('x')
features.append(x**2)
feature_names.append('x^2')
for w in candidate_frequencies:
features.append(np.sin(2 * np.pi * w * x))
feature_names.append(f'sin(2π*{w:.2f}x)')
features.append(np.cos(2 * np.pi * w * x))
feature_names.append(f'cos(2π*{w:.2f}x)')
X = np.column_stack(features)
return X, feature_names
def sparse_forecast(asset_names, asset_times, asset_prices, training_start=0, training_end=100, forecast_time=200, alpha=0.01):
"""
For each asset, perform:
1. Denoise the full signal.
2. From training data ([training_start, training_end]), perform FFT on the denoised data
to select candidate frequencies.
3. Compute the linear trend using raw training data.
4. Subtract the trend from the denoised training data.
5. Build a candidate library over the full time series.
6. Fit a sparse (LASSO) model on the detrended training data.
7. Reconstruct the full signal (by adding back the extrapolated trend).
8. Evaluate the learned function at any new x using the candidate library.
"""
sparse_pred = {}
for i, name in enumerate(asset_names):
times = np.array(asset_times[i])
prices = np.array(asset_prices[i])
# 1. Denoise the full signal.
prices_denoised = savgol_filter(prices, window_length=11, polyorder=2)
# 2. Restrict to training data and perform FFT on the denoised training data.
mask = (times >= training_start) & (times <= training_end)
t_train = times[mask]
p_train_denoised = prices_denoised[mask]
N = len(t_train)
T = t_train[1] - t_train[0] if N > 1 else 1
fft_vals = np.fft.fft(p_train_denoised)
freq = np.fft.fftfreq(N, T)
pos_mask = freq > 0
freq_pos = freq[pos_mask]
fft_magnitude = np.abs(fft_vals)[pos_mask]
threshold = np.mean(fft_magnitude) + 1 * np.std(fft_magnitude)
candidate_frequencies = freq_pos[fft_magnitude > threshold]
candidate_frequencies = np.unique(np.round(candidate_frequencies, 2))
# 3. Compute trend using raw training data (to preserve the true slope).
p_train_raw = prices[mask]
coeffs_trend = np.polyfit(t_train, p_train_raw, 1)
trend_train = np.polyval(coeffs_trend, t_train)
# 4. Subtract the trend from the denoised training data.
p_train_detrended = p_train_denoised - trend_train
# 5. Build candidate library over the full time series.
X, _ = build_candidate_library(times, candidate_frequencies)
X_train = X[mask, :]
# 6. Fit sparse regression (LASSO) on detrended training data.
lasso = Lasso(alpha=alpha, max_iter=10000)
lasso.fit(X_train, p_train_detrended)
coefs = lasso.coef_
intercept_sparse = lasso.intercept_
# 7. Reconstruct the full signal.
p_detrended_reconstructed = intercept_sparse + X.dot(coefs)
trend_full = np.polyval(coeffs_trend, times)
p_reconstructed = p_detrended_reconstructed + trend_full
# Clip negative values in the training reconstruction:
p_reconstructed = np.maximum(p_reconstructed, 0)
# 8. Define a prediction function that evaluates the learned function at any new x.
def predict_new(x_new):
X_new, _ = build_candidate_library(np.array([x_new]), candidate_frequencies)
value = intercept_sparse + X_new.dot(coefs) + np.polyval(coeffs_trend, x_new)
# Return non-negative prediction.
return max(value.item(), 0)
price_at_training = predict_new(training_end)
price_at_forecast = predict_new(forecast_time)
predicted_return = (price_at_forecast - price_at_training) / price_at_training
# Generate a new reconstruction time vector from training_start to forecast_time.
recon_times = np.linspace(training_start, forecast_time, forecast_time - training_start)
X_new, _ = build_candidate_library(recon_times, candidate_frequencies)
p_detrended_new = intercept_sparse + X_new.dot(coefs)
p_reconstructed_new = p_detrended_new + np.polyval(coeffs_trend, recon_times)
# Clip negative values.
p_reconstructed_new = np.maximum(p_reconstructed_new, 0)
sparse_pred[name] = {
"model_predictions": (t_train, np.maximum(p_reconstructed[mask], 0)), # for diagnostic on training data
"full_reconstruction": (recon_times, p_reconstructed_new),
"price_at_training": price_at_training,
"price_at_forecast": price_at_forecast,
"predicted_return": predicted_return
}
return sparse_pred
# Example usage:
sparse_results = sparse_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200)
plot_asset_forecasts(asset_names, asset_times, asset_prices, sparse_results, training_offset=100, color="red")
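The FFT-based frequency selection in step 2 can be illustrated on a clean synthetic sinusoid: with unit sample spacing and a frequency that falls exactly on an FFT bin, the dominant positive frequency is recovered exactly.

```python
import numpy as np

# Synthetic signal with one known frequency (0.05 cycles per time unit).
t = np.arange(0, 100, 1.0)          # 100 samples, unit spacing
y = np.sin(2 * np.pi * 0.05 * t)    # exactly 5 full periods

fft_vals = np.fft.fft(y)
freq = np.fft.fftfreq(len(t), d=1.0)
pos = freq > 0                       # keep positive frequencies only
mag = np.abs(fft_vals)[pos]
dominant = freq[pos][np.argmax(mag)]
print(dominant)  # 0.05
```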
Illustrate your predictions by plotting the historical data and the linear fit for a few assets, plus the forecasted price at t=200.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
def arima_forecast(asset_names, asset_times, asset_prices, training_start=0, training_end=100, forecast_time=200, order=(1,1,1), use_log=True):
"""
For each asset, fit an ARIMA model on training data ([training_start, training_end])
and forecast up to forecast_time.
If use_log is True, a logarithmic transformation is applied to the training data,
and forecasts are exponentiated to ensure all predictions are positive.
Computes expected return as:
(price_at_forecast - price_at_training) / price_at_training.
Returns a dictionary with, for each asset:
- "model_predictions": (training_times, in-sample predictions) on [training_start, training_end]
- "full_reconstruction": (recon_times, arima_predictions) on [training_start, forecast_time]
- "price_at_training": predicted price at training_end (last training prediction)
- "price_at_forecast": predicted price at forecast_time (last forecast value)
- "predicted_return": computed return
"""
arima_pred = {}
for i, name in enumerate(asset_names):
times = np.array(asset_times[i])
prices = np.array(asset_prices[i])
# Select training data based on the provided time window.
mask = (times >= training_start) & (times <= training_end)
t_train = times[mask]
p_train = prices[mask]
if use_log:
# Check that all training prices are positive.
if np.any(p_train <= 0):
raise ValueError(f"Asset {name} has non-positive prices; cannot use log transform.")
p_train_trans = np.log(p_train)
else:
p_train_trans = p_train.copy()
# Fit the ARIMA model on the (possibly transformed) training data.
try:
model = ARIMA(p_train_trans, order=order)
model_fit = model.fit()
except Exception as e:
print(f"ARIMA fit failed for asset {name}: {e}")
continue
# In-sample prediction on the training data.
pred_train_trans = model_fit.predict(start=0, end=len(p_train_trans)-1)
# If using log transform, exponentiate predictions.
if use_log:
pred_train = np.exp(pred_train_trans)
else:
pred_train = pred_train_trans
# Determine forecast steps.
forecast_steps = int(forecast_time - training_end)
if forecast_steps <= 0:
raise ValueError("forecast_time must be greater than training_end")
# Forecast future values.
forecast_trans = model_fit.forecast(steps=forecast_steps)
if use_log:
forecast = np.exp(forecast_trans)
else:
forecast = forecast_trans
# Construct full time series reconstruction.
forecast_times = np.arange(training_end + 1, forecast_time + 1)
full_times = np.concatenate([t_train, forecast_times])
full_predictions = np.concatenate([pred_train, forecast])
# Get the predicted price at training_end and forecast_time.
price_at_training = pred_train[-1]
price_at_forecast = forecast[-1]
predicted_return = (price_at_forecast - price_at_training) / price_at_training
# In case any residual negatives appear (shouldn't happen with log-transform):
full_predictions = np.maximum(full_predictions, 0)
arima_pred[name] = {
"model_predictions": (t_train, pred_train),
"full_reconstruction": (full_times, full_predictions),
"price_at_training": price_at_training,
"price_at_forecast": price_at_forecast,
"predicted_return": predicted_return
}
return arima_pred
# Example usage:
# (Ensure asset_names, asset_times, asset_prices, and plot_asset_forecasts are defined.)
arima_results = arima_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200,
order=(2,1,2), use_log=True)
plot_asset_forecasts(asset_names, asset_times, asset_prices, arima_results, training_offset=100, color="red")
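The rationale for `use_log` is that the ARIMA forecast lives in log-space, where it may dip below zero, yet exponentiating always maps it back to strictly positive prices. A tiny numpy check (hypothetical log-price values):

```python
import numpy as np

# Hypothetical log-price forecasts, including negative values.
log_forecast = np.array([-0.5, 0.0, 0.3, -2.0])
prices = np.exp(log_forecast)       # strictly positive, whatever the inputs
print(prices.min() > 0)             # True

# A one-step return derived from prices equals exp(delta log) - 1.
r = prices[2] / prices[1] - 1.0
```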
/Users/dromaniv/multiple-objective-optimization/.venv/lib/python3.12/site-packages/statsmodels/tsa/statespace/sarimax.py:966: UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
warn('Non-stationary starting autoregressive parameters'
/Users/dromaniv/multiple-objective-optimization/.venv/lib/python3.12/site-packages/statsmodels/tsa/statespace/sarimax.py:978: UserWarning: Non-invertible starting MA parameters found. Using zeros as starting parameters.
warn('Non-invertible starting MA parameters found.'
/Users/dromaniv/multiple-objective-optimization/.venv/lib/python3.12/site-packages/statsmodels/base/model.py:607: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
warnings.warn("Maximum Likelihood optimization failed to "
(the same ConvergenceWarning repeats for the remaining assets)
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing
def exponential_smoothing_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200,
trend='add', seasonal=None, seasonal_periods=None,
use_multiplicative=False):
"""
Returns:
A dictionary where each key is an asset name and each value is a dictionary containing:
- "model_predictions": (training_times, in-sample predictions) on [training_start, training_end]
- "full_reconstruction": (recon_times, exponential smoothing predictions) on [training_start, forecast_time]
- "price_at_training": predicted price at training_end (last in-sample prediction)
- "price_at_forecast": predicted price at forecast_time (last forecast value)
- "predicted_return": computed return as (price_at_forecast - price_at_training) / price_at_training
"""
exp_pred = {}
for i, name in enumerate(asset_names):
times = np.array(asset_times[i])
prices = np.array(asset_prices[i])
# Select training data based on the provided time window.
mask = (times >= training_start) & (times <= training_end)
t_train = times[mask]
p_train = prices[mask]
# Optionally, use multiplicative components if data are strictly positive.
if use_multiplicative:
if np.any(p_train <= 0):
print(f"Warning: Asset {name} has non-positive values. Multiplicative model not appropriate. Using additive model.")
else:
trend = 'mul'
if seasonal is not None:
seasonal = 'mul'
# Fit the exponential smoothing model.
try:
model = ExponentialSmoothing(p_train, trend=trend, seasonal=seasonal, seasonal_periods=seasonal_periods)
model_fit = model.fit()
except Exception as e:
print(f"Exponential Smoothing model fit failed for asset {name}: {e}")
continue
# In-sample fitted values (for training period).
pred_train = model_fit.fittedvalues
# Clip negative values.
pred_train = np.maximum(pred_train, 0)
# Determine number of forecast steps.
forecast_steps = int(forecast_time - training_end)
if forecast_steps <= 0:
raise ValueError("forecast_time must be greater than training_end")
# Forecast future values.
forecast_values = model_fit.forecast(steps=forecast_steps)
forecast_values = np.maximum(forecast_values, 0)
# Generate a time vector for the forecast period.
forecast_times = np.arange(training_end + 1, forecast_time + 1)
# Build the full reconstruction time series.
full_times = np.concatenate([t_train, forecast_times])
full_predictions = np.concatenate([pred_train, forecast_values])
full_predictions = np.maximum(full_predictions, 0) # ensure non-negative
# Get the predicted prices at training_end and forecast_time.
price_at_training = pred_train[-1]
price_at_forecast = forecast_values[-1]
predicted_return = (price_at_forecast - price_at_training) / price_at_training
exp_pred[name] = {
"model_predictions": (t_train, pred_train),
"full_reconstruction": (full_times, full_predictions),
"price_at_training": price_at_training,
"price_at_forecast": price_at_forecast,
"predicted_return": predicted_return
}
return exp_pred
exp_results = exponential_smoothing_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200,
trend='add', seasonal=None, seasonal_periods=None,
use_multiplicative=True)
plot_asset_forecasts(asset_names, asset_times, asset_prices, exp_results, training_offset=100, color="red")
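For intuition, here is a minimal hand-rolled sketch of the additive-trend (Holt) recursions that underlie `ExponentialSmoothing(trend='add')`; the initialization is a deliberately naive assumption, not what statsmodels does internally. On a perfectly linear series the recursions track the trend exactly:

```python
import numpy as np

def holt_forecast(y, alpha=0.5, beta=0.3, h=1):
    """Additive-trend (Holt) smoothing; returns the h-step-ahead forecast."""
    level, trend = y[0], y[1] - y[0]   # naive initialization
    for t in range(1, len(y)):
        prev_level = level
        level = alpha * y[t] + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + h * trend            # forecast extends the trend linearly

y = 3.0 * np.arange(50) + 5.0           # exactly linear toy series
print(holt_forecast(y, h=10))           # 3*59 + 5 = 182
```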
import numpy as np
import pandas as pd
from sktime.forecasting.naive import NaiveForecaster
def naive_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200, strategy="last"):
"""
Returns:
A dictionary where for each asset the following keys are returned:
- "model_predictions": (training_times, in-sample naive predictions) on [training_start, training_end]
- "full_reconstruction": (recon_times, naive forecast predictions) on [training_start, forecast_time]
- "price_at_training": reference price at training_end (from actual training data)
- "price_at_forecast": forecasted price at forecast_time
- "predicted_return": computed return as (price_at_forecast - price_at_training) / price_at_training
"""
naive_pred = {}
for i, name in enumerate(asset_names):
times = np.array(asset_times[i])
prices = np.array(asset_prices[i])
# Select training data: times between training_start and training_end.
mask = (times >= training_start) & (times <= training_end)
t_train = times[mask]
p_train = prices[mask]
# Compute in-sample naive predictions.
if strategy == "last":
in_sample_predictions = np.empty_like(p_train)
in_sample_predictions[0] = p_train[0]
if len(p_train) > 1:
in_sample_predictions[1:] = p_train[:-1]
elif strategy == "drift":
if len(p_train) > 1:
drift = (p_train[-1] - p_train[0]) / (len(p_train) - 1)
else:
drift = 0
in_sample_predictions = np.empty_like(p_train)
in_sample_predictions[0] = p_train[0]
for j in range(1, len(p_train)):
in_sample_predictions[j] = p_train[j-1] + drift
elif strategy == "mean":
mean_value = np.mean(p_train)
in_sample_predictions = np.full_like(p_train, fill_value=mean_value)
else:
raise ValueError("Unsupported strategy. Use 'last', 'drift', or 'mean'.")
# Clip negative in-sample predictions.
in_sample_predictions = np.maximum(in_sample_predictions, 0)
# Determine the number of forecast steps.
forecast_steps = int(forecast_time - training_end)
if forecast_steps <= 0:
raise ValueError("forecast_time must be greater than training_end")
# Prepare the training series for sktime.
# Assuming training times are consecutive integers.
y_train = pd.Series(p_train, index=pd.RangeIndex(start=int(t_train[0]),
stop=int(t_train[0]) + len(t_train)))
# Fit the NaiveForecaster from sktime.
forecaster = NaiveForecaster(strategy=strategy)
forecaster.fit(y_train)
fh = np.arange(1, forecast_steps + 1) # forecasting horizon (steps ahead)
y_forecast = forecaster.predict(fh)
# Clip forecast predictions to ensure non-negativity.
forecast_values = np.maximum(y_forecast.values, 0)
# Create forecast time points (assuming integer time steps).
forecast_times = np.arange(training_end + 1, forecast_time + 1)
# Full reconstruction: combine in-sample predictions and forecast values.
full_times = np.concatenate([t_train, forecast_times])
full_predictions = np.concatenate([in_sample_predictions, forecast_values])
# Determine prices at training_end and forecast_time.
price_at_training = p_train[-1] # using actual last training price as reference
price_at_forecast = forecast_values[-1]
predicted_return = (price_at_forecast - price_at_training) / price_at_training
naive_pred[name] = {
"model_predictions": (t_train, in_sample_predictions),
"full_reconstruction": (full_times, full_predictions),
"price_at_training": price_at_training,
"price_at_forecast": price_at_forecast,
"predicted_return": predicted_return
}
return naive_pred
# Example usage:
naive_results = naive_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200, strategy="last")
plot_asset_forecasts(asset_names, asset_times, asset_prices, naive_results, training_offset=100, color="green")
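The drift strategy's out-of-sample behavior is easy to state in closed form: the h-step-ahead forecast is the last observation plus h times the average step, i.e. the line through the first and last training points extended forward. A numpy sketch (toy numbers):

```python
import numpy as np

def drift_forecast(y, h):
    """h-step-ahead drift forecast: last value + h * average step."""
    drift = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + h * drift

y = np.array([10.0, 12.0, 11.0, 15.0, 14.0])
print(drift_forecast(y, h=100))  # 14 + 100 * (14 - 10)/4 = 114
```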
import numpy as np
from sklearn.tree import DecisionTreeRegressor
def decision_tree_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200,
seasonal_period=None, max_depth=None):
"""
For each asset, forecast prices using a Decision Tree Regressor with conditional deseasonalization and detrending.
Steps for each asset:
1. Extract training data from training_start to training_end.
2. Fit a linear trend (using np.polyfit) to the training data.
3. Detrend the training data by subtracting the trend.
4. If seasonal_period is provided and > 1, compute a seasonal component (average residual per seasonal position)
and remove it (deseasonalize the data).
5. Fit a Decision Tree Regressor on the deseasonalized residuals.
6. For forecasting, predict the residuals on new time points, add back the trend (evaluated using the linear model)
and the seasonal component (if available) to obtain the final forecast.
"""
dt_pred = {}
for i, name in enumerate(asset_names):
times = np.array(asset_times[i])
prices = np.array(asset_prices[i])
# 1. Select training data
mask = (times >= training_start) & (times <= training_end)
t_train = times[mask]
p_train = prices[mask]
# 2. Fit a linear trend for detrending (using a first-degree polynomial)
coeffs = np.polyfit(t_train, p_train, deg=1)
trend_train = np.polyval(coeffs, t_train)
# 3. Detrend the training data
p_detrended = p_train - trend_train
# 4. Conditional deseasonalization if a seasonal period is provided
if seasonal_period is not None and seasonal_period > 1:
# Compute the average seasonal effect for each season position
seasonal_effect = np.zeros(seasonal_period)
count = np.zeros(seasonal_period)
for idx, t in enumerate(t_train):
pos = int((t - training_start) % seasonal_period)
seasonal_effect[pos] += p_detrended[idx]
count[pos] += 1
# Avoid division by zero and compute average
seasonal_effect = np.where(count > 0, seasonal_effect / count, 0)
# Build the seasonal component for each training time
seasonal_train = np.array([seasonal_effect[int((t - training_start) % seasonal_period)]
for t in t_train])
# Remove seasonal effect from the detrended data
p_deseasonalized = p_detrended - seasonal_train
else:
p_deseasonalized = p_detrended
seasonal_train = np.zeros_like(p_train) # no seasonal component
# 5. Fit Decision Tree Regressor on the deseasonalized (residual) training data.
dt_reg = DecisionTreeRegressor(max_depth=max_depth)
dt_reg.fit(t_train.reshape(-1, 1), p_deseasonalized)
# In-sample prediction on training data (residual prediction)
pred_deseasonalized_train = dt_reg.predict(t_train.reshape(-1, 1))
# Reconstruct training predictions by adding back the trend and seasonal component (if any)
pred_train_reconstructed = pred_deseasonalized_train + trend_train
if seasonal_period is not None and seasonal_period > 1:
pred_train_reconstructed += seasonal_train
# 6. Forecasting for times after training_end
forecast_times = np.arange(training_end + 1, forecast_time + 1)
# Predict the residuals for forecast times
pred_deseasonalized_forecast = dt_reg.predict(forecast_times.reshape(-1, 1))
# Evaluate trend for forecast times
trend_forecast = np.polyval(coeffs, forecast_times)
# If seasonal, compute seasonal component for forecast times
if seasonal_period is not None and seasonal_period > 1:
seasonal_forecast = np.array([seasonal_effect[int((t - training_start) % seasonal_period)]
for t in forecast_times])
else:
seasonal_forecast = np.zeros_like(forecast_times, dtype=float)  # forecast_times is integer-valued, so force float zeros
# Reconstruct forecasted predictions
forecast_pred = pred_deseasonalized_forecast + trend_forecast + seasonal_forecast
# 7. Build full reconstruction of predictions (training + forecast)
full_times = np.concatenate([t_train, forecast_times])
full_predictions = np.concatenate([pred_train_reconstructed, forecast_pred])
# Determine price at training_end (last training prediction) and at forecast_time (last forecast value)
price_at_training = pred_train_reconstructed[-1]
price_at_forecast = forecast_pred[-1]
predicted_return = (price_at_forecast - price_at_training) / price_at_training
dt_pred[name] = {
"model_predictions": (t_train, pred_train_reconstructed),
"full_reconstruction": (full_times, full_predictions),
"price_at_training": price_at_training,
"price_at_forecast": price_at_forecast,
"predicted_return": predicted_return
}
return dt_pred
# Example usage:
dt_results = decision_tree_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200,
seasonal_period=50, max_depth=50)
plot_asset_forecasts(asset_names, asset_times, asset_prices, dt_results, training_offset=100, color="orange")
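The position-wise seasonal averaging in step 4 can be checked in isolation. This is a minimal sketch on a toy detrended series with a known period (the series, pattern, and period are invented for illustration); on exact repetitions the averaging recovers the pattern exactly:

```python
import numpy as np

# Toy detrended series with a known period-4 pattern (values invented).
seasonal_period = 4
pattern = np.array([1.0, -2.0, 0.5, 0.5])
p_detrended = np.tile(pattern, 3)          # three full seasons
t_train = np.arange(len(p_detrended))      # training_start = 0 here

# Same accumulation as step 4 of decision_tree_forecast.
seasonal_effect = np.zeros(seasonal_period)
count = np.zeros(seasonal_period)
for idx, t in enumerate(t_train):
    pos = int(t % seasonal_period)
    seasonal_effect[pos] += p_detrended[idx]
    count[pos] += 1
seasonal_effect = np.where(count > 0, seasonal_effect / count, 0)
# seasonal_effect now recovers the pattern: 1.0, -2.0, 0.5, 0.5
```

With noisy data the same code returns the per-position means, which is why the `count` guard against empty positions matters when the training window is not a whole number of seasons.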
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN
def create_dataset(data, look_back):
"""
Create sliding-window sequences for training.
Args:
data: 1D numpy array of normalized asset prices.
look_back: number of time steps to use as input.
Returns:
X: numpy array of shape (n_samples, look_back)
y: numpy array of shape (n_samples,)
"""
X, y = [], []
for i in range(len(data) - look_back):
X.append(data[i:i+look_back])
y.append(data[i+look_back])
return np.array(X), np.array(y)
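A quick check of the windowing on a tiny array (the values are arbitrary; the definition is repeated so the example runs on its own):

```python
import numpy as np

# Definition repeated from the cell above so this example is standalone.
def create_dataset(data, look_back):
    X, y = [], []
    for i in range(len(data) - look_back):
        X.append(data[i:i+look_back])
        y.append(data[i+look_back])
    return np.array(X), np.array(y)

data = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
X, y = create_dataset(data, look_back=2)
# X -> [[0.1, 0.2], [0.2, 0.3], [0.3, 0.4]], y -> [0.3, 0.4, 0.5]
```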
def rnn_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200,
look_back=10, epochs=20, batch_size=64):
"""
For each asset, this function:
1. Selects training data in [training_start, training_end].
2. Computes the min and max of the training data and normalizes it.
3. Creates sliding-window sequences and trains an RNN model on the normalized data.
4. Makes in-sample predictions and recursively forecasts until forecast_time (all in normalized space).
5. Transforms the predictions back to the original scale using the stored min and max.
6. Clipping: Any negative values in the predictions (both in-sample and forecast) are set to 0.
"""
rnn_pred = {}
for i, name in enumerate(asset_names):
times = np.array(asset_times[i])
prices = np.array(asset_prices[i])
# Select training data using the provided time window.
mask = (times >= training_start) & (times <= training_end)
t_train = times[mask]
p_train = prices[mask]
if len(p_train) <= look_back:
raise ValueError(f"Not enough training data for asset {name} with look_back = {look_back}.")
# Get the min and max from the training data (original scale).
train_min = p_train.min()
train_max = p_train.max()
# Normalize the training data.
p_train_norm = (p_train - train_min) / (train_max - train_min)
# Create sliding-window sequences on the normalized data.
X_train, y_train = create_dataset(p_train_norm, look_back)
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
# Build the RNN model using the specified architecture.
model = Sequential()
model.add(SimpleRNN(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
model.add(SimpleRNN(units=50, return_sequences=False))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')
# Train the RNN model on normalized data (verbose=0 to suppress output)
model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)
# In-sample predictions on the normalized training data.
pred_train_norm = model.predict(X_train, verbose=0).flatten()
# Inverse transform the normalized predictions to the original scale.
pred_train = pred_train_norm * (train_max - train_min) + train_min
# Clip any negative predictions to 0.
pred_train = np.maximum(pred_train, 0)
# The in-sample predictions correspond to times t_train[look_back:].
t_train_pred = t_train[look_back:]
# Determine forecast steps (assumes integer time steps).
forecast_steps = int(forecast_time - training_end)
if forecast_steps <= 0:
raise ValueError("forecast_time must be greater than training_end")
# Recursive forecasting: use the last look_back normalized values to predict the next value.
last_sequence = p_train_norm[-look_back:].tolist()
forecasted_norm = []
for _ in range(forecast_steps):
input_seq = np.array(last_sequence[-look_back:]).reshape((1, look_back, 1))
next_val_norm = model.predict(input_seq, verbose=0)[0, 0]
forecasted_norm.append(next_val_norm)
last_sequence.append(next_val_norm)
# Inverse-transform the forecasted normalized predictions.
forecasted = np.array(forecasted_norm) * (train_max - train_min) + train_min
# Clip negative forecast values.
forecasted = np.maximum(forecasted, 0)
# Construct full reconstruction: combine in-sample predictions and forecasted values.
forecast_times = np.arange(training_end + 1, forecast_time + 1)
full_times = np.concatenate([t_train_pred, forecast_times])
full_predictions = np.concatenate([pred_train, forecasted])
full_predictions = np.maximum(full_predictions, 0)
price_at_training = pred_train[-1]
price_at_forecast = forecasted[-1]
predicted_return = (price_at_forecast - price_at_training) / price_at_training
rnn_pred[name] = {
"model_predictions": (t_train_pred, pred_train),
"full_reconstruction": (full_times, full_predictions),
"price_at_training": price_at_training,
"price_at_forecast": price_at_forecast,
"predicted_return": predicted_return
}
return rnn_pred
# (Assuming asset_names, asset_times, asset_prices, and plot_asset_forecasts are defined)
rnn_results = rnn_forecast(asset_names, asset_times, asset_prices,
training_start=0, training_end=100, forecast_time=200,
look_back=30, epochs=20, batch_size=64)
plot_asset_forecasts(asset_names, asset_times, asset_prices, rnn_results, training_offset=100, color="brown")
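The recursive (multi-step) forecasting loop in step 4 is independent of the network itself. This sketch replaces the trained RNN with a hypothetical one-step predictor (a simple damped-mean rule, invented purely for illustration) to show how each prediction is fed back as input for the next step:

```python
import numpy as np

look_back = 3
history = [0.2, 0.4, 0.6]  # last look_back normalized values (made up)

def predict_next(window):
    # Hypothetical stand-in for model.predict: the window mean, damped toward 0.5.
    return 0.9 * np.mean(window) + 0.1 * 0.5

last_sequence = list(history)
forecasted_norm = []
for _ in range(4):  # forecast 4 steps ahead
    next_val = predict_next(last_sequence[-look_back:])
    forecasted_norm.append(next_val)
    last_sequence.append(next_val)  # feed the prediction back as input
```

Because each step consumes its own previous outputs, errors compound with the horizon; this is the main reason the RNN forecasts flatten out or drift over the 100-step horizon used above.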
import numpy as np
import matplotlib.pyplot as plt
def aggregate_and_plot_mean_median(asset_names, asset_times, asset_prices,
rnn_results, dt_results, naive_results, exp_results,
arima_results, sparse_results, linear_results,
training_start=0, training_end=100, forecast_time=200):
"""
For each asset, this function:
- Prints a table of predicted training price, forecast price, and expected return (in %)
for each forecasting method.
- Computes the pointwise median and mean forecasts across methods.
- Displays two separate 5×4 grids: one for the median forecast and one for the mean forecast.
Expected return is reported as a gross return (a price ratio): (forecast / training_price) * 100%.
Returns a dictionary with two keys: "median" and "mean". Each value is a dictionary where each asset
is represented with keys:
- "model_predictions": (training_times, aggregated in-sample predictions) – here we fill the training period with the aggregated training price.
- "full_reconstruction": (common_times, aggregated full forecast)
- "price_at_training": aggregated training price (scalar)
- "price_at_forecast": aggregated forecast price (scalar)
- "predicted_return": aggregated expected return (scalar)
"""
# Combine forecast dictionaries.
methods = {
'RNN': rnn_results,
'DecisionTree': dt_results,
'Naive': naive_results,
'ExpSmoothing': exp_results,
'ARIMA': arima_results,
'Sparse': sparse_results,
'Linear': linear_results
}
# Create a common, unit-spaced time axis covering training and forecast periods.
common_times = np.linspace(training_start, forecast_time, forecast_time - training_start + 1)
# Containers for aggregated results.
median_results = {}
mean_results = {}
# Loop over assets.
for idx, asset in enumerate(asset_names):
print(f"\n{'='*60}\nAsset: {asset}")
# Get asset training data.
times = np.array(asset_times[idx])
prices = np.array(asset_prices[idx])
mask = (times >= training_start) & (times <= training_end)
t_train = times[mask]
real_training_price = prices[mask][-1]
print(f"Real training price (last value in training data): {real_training_price:.4f}")
# Initialize lists to collect scalar predictions and aligned full forecasts.
training_preds = [] # from each method: price_at_training
forecast_preds = [] # from each method: price_at_forecast
exp_returns = [] # expected return in percentage
full_predictions_aligned = [] # each element is an array aligned on common_times
# Print table header.
header = f"{'Method':<15} {'TrainPred':>10} {'Forecast':>10} {'Return (%)':>12}"
print(header)
print("-" * len(header))
for method_name, result in methods.items():
if asset not in result:
continue
res = result[asset]
train_pred = res["price_at_training"]
forecast_pred = res["price_at_forecast"]
# Expected return = (forecast / train_pred)*100%
exp_return = (forecast_pred / train_pred) * 100.0
print(f"{method_name:<15} {train_pred:10.4f} {forecast_pred:10.4f} {exp_return:12.2f}")
training_preds.append(train_pred)
forecast_preds.append(forecast_pred)
exp_returns.append(exp_return)
# Interpolate full reconstruction on common_times.
method_times, method_preds = res["full_reconstruction"]
aligned = np.interp(common_times, method_times, method_preds)
full_predictions_aligned.append(aligned)
# Compute aggregated scalar values.
median_train = np.median(training_preds)
median_forecast = np.median(forecast_preds)
median_return = np.median(exp_returns)
mean_train = np.mean(training_preds)
mean_forecast = np.mean(forecast_preds)
mean_return = np.mean(exp_returns)
print("-" * len(header))
print(f"{'Median':<15} {median_train:10.4f} {median_forecast:10.4f} {median_return:12.2f}")
print(f"{'Mean':<15} {mean_train:10.4f} {mean_forecast:10.4f} {mean_return:12.2f}")
# Compute pointwise aggregated forecasts.
full_predictions_aligned = np.array(full_predictions_aligned)
median_full = np.median(full_predictions_aligned, axis=0)
mean_full = np.mean(full_predictions_aligned, axis=0)
# Save aggregated results for this asset.
# For in-sample predictions, we simply fill the training period with the aggregated training price.
median_results[asset] = {
"model_predictions": (t_train, np.full_like(t_train, median_train)),
"full_reconstruction": (common_times, median_full),
"price_at_training": median_train,
"price_at_forecast": median_forecast,
"predicted_return": median_return
}
mean_results[asset] = {
"model_predictions": (t_train, np.full_like(t_train, mean_train)),
"full_reconstruction": (common_times, mean_full),
"price_at_training": mean_train,
"price_at_forecast": mean_forecast,
"predicted_return": mean_return
}
# Plotting: Two separate 5x4 charts.
n_assets = len(asset_names)
n_rows, n_cols = 5, 4
# Figure for Median forecasts.
fig_med, axes_med = plt.subplots(n_rows, n_cols, figsize=(20, 25), sharex=True, sharey=False)
axes_med = axes_med.flatten()
for idx, asset in enumerate(asset_names):
ax = axes_med[idx]
# Plot aggregated median forecast.
ax.plot(common_times, median_results[asset]["full_reconstruction"][1], label="Median Forecast", color='black', linewidth=2)
# Plot each method's forecast.
for method_name, result in methods.items():
if asset in result:
m_times, m_preds = result[asset]["full_reconstruction"]
aligned = np.interp(common_times, m_times, m_preds)
ax.plot(common_times, aligned, label=method_name, linestyle="--", alpha=0.6)
# Plot real training data (idx already indexes this asset).
asset_t = np.array(asset_times[idx])
t_mask = (asset_t >= training_start) & (asset_t <= training_end)
ax.plot(asset_t[t_mask], np.array(asset_prices[idx])[t_mask], label="Real Training", color="blue", linewidth=2)
ax.set_title(asset)
ax.set_xlabel("Time")
ax.set_ylabel("Price")
ax.legend(fontsize=8)
# Hide any unused subplots.
for j in range(idx+1, n_rows*n_cols):
axes_med[j].axis('off')
fig_med.tight_layout()
# Figure for Mean forecasts.
fig_mean, axes_mean = plt.subplots(n_rows, n_cols, figsize=(20, 25), sharex=True, sharey=False)
axes_mean = axes_mean.flatten()
for idx, asset in enumerate(asset_names):
ax = axes_mean[idx]
# Plot aggregated mean forecast.
ax.plot(common_times, mean_results[asset]["full_reconstruction"][1], label="Mean Forecast", color='black', linewidth=2)
# Plot each method's forecast.
for method_name, result in methods.items():
if asset in result:
m_times, m_preds = result[asset]["full_reconstruction"]
aligned = np.interp(common_times, m_times, m_preds)
ax.plot(common_times, aligned, label=method_name, linestyle="--", alpha=0.6)
# Plot real training data (idx already indexes this asset).
asset_t = np.array(asset_times[idx])
t_mask = (asset_t >= training_start) & (asset_t <= training_end)
ax.plot(asset_t[t_mask], np.array(asset_prices[idx])[t_mask], label="Real Training", color="blue", linewidth=2)
ax.set_title(asset)
ax.set_xlabel("Time")
ax.set_ylabel("Price")
ax.legend(fontsize=8)
for j in range(idx+1, n_rows*n_cols):
axes_mean[j].axis('off')
fig_mean.tight_layout()
plt.show()
# Return the aggregated results.
return {"median": median_results, "mean": mean_results}
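The alignment step inside the function is worth seeing in isolation, since the methods return forecasts on different time grids and must be interpolated onto `common_times` before pointwise aggregation. A minimal sketch with two hypothetical methods (grids and values invented):

```python
import numpy as np

# Shared axis onto which every method's forecast is interpolated.
common_times = np.linspace(0, 10, 11)  # 0, 1, ..., 10

# Two hypothetical methods reported on different time grids (values made up).
times_a, preds_a = np.array([0.0, 10.0]), np.array([1.0, 3.0])
times_b, preds_b = np.array([0.0, 5.0, 10.0]), np.array([2.0, 2.0, 4.0])

aligned = np.array([np.interp(common_times, times_a, preds_a),
                    np.interp(common_times, times_b, preds_b)])

# Pointwise aggregation across methods, as in aggregate_and_plot_mean_median.
median_full = np.median(aligned, axis=0)
mean_full = np.mean(aligned, axis=0)
```

Note that `np.interp` requires the source time grid to be increasing and holds the endpoint values constant outside the grid, which is harmless here because all methods cover the full [training_start, forecast_time] range.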
results = aggregate_and_plot_mean_median(asset_names, asset_times, asset_prices,
rnn_results, dt_results, naive_results, exp_results,
arima_results, sparse_results, linear_results,
training_start=0, training_end=100, forecast_time=200)
============================================================
Asset: SafeAndCare
Real training price (last value in training data): 6.3601
Method           TrainPred   Forecast   Return (%)
--------------------------------------------------
RNN                 6.4191     6.3357        98.70
DecisionTree        6.3601     5.4799        86.16
Naive               6.3601     6.3601       100.00
ExpSmoothing        6.3384     5.0240        79.26
ARIMA               6.3513     6.0570        95.37
Sparse              6.2786     7.8440       124.93
Linear              6.4551     5.5750        86.36
--------------------------------------------------
Median              6.3601     6.0570        95.37
Mean                6.3661     6.0965        95.83
(analogous tables are printed for the remaining 19 assets)
Markowitz Model using mean results¶
# Extract expected (gross) returns from the mean-aggregated results.
# predicted_return is stored as a percentage of the training price, so e.g. 95.83 -> 0.9583.
expected_returns = np.array([results["mean"][asset]["predicted_return"] / 100.0 for asset in asset_names])
n_assets = len(asset_names)
# Compute covariance matrix using historical data
historical_returns = []
for i, asset in enumerate(asset_names):
# Get prices during training period
prices = np.array(asset_prices[i])
times = np.array(asset_times[i])
mask = (times >= 0) & (times <= 100)
asset_prices_train = prices[mask]
# Calculate returns
asset_returns = np.diff(asset_prices_train) / asset_prices_train[:-1]
historical_returns.append(asset_returns)
# Ensure all historical returns have the same length
min_length = min(len(returns) for returns in historical_returns)
historical_returns = [returns[:min_length] for returns in historical_returns]
historical_returns = np.array(historical_returns)
# Compute covariance matrix
cov_matrix = np.cov(historical_returns)
# Ensure covariance matrix is positive semi-definite
eigenvalues = np.linalg.eigvalsh(cov_matrix)
if np.any(eigenvalues < 0):
# Shift the diagonal just enough to push every eigenvalue above zero
cov_matrix += np.eye(n_assets) * (1e-8 - eigenvalues.min())
print("Expected Returns (sample):")
for i in range(min(5, n_assets)):
print(f"{asset_names[i]}: {expected_returns[i]:.4f}")
print("\nCovariance Matrix (sample):")
print(cov_matrix[:3, :3])
# Find minimum risk portfolio
def solve_min_risk():
"""Find the portfolio with minimum risk"""
P = matrix(cov_matrix)
q = matrix(np.zeros(n_assets))
# Constraint: w ≥ 0
G = matrix(-np.eye(n_assets))
h = matrix(np.zeros(n_assets))
# Constraint: sum(w) = 1
A = matrix(np.ones((1, n_assets)))
b = matrix(np.ones(1))
sol = solvers.qp(P, q, G, h, A, b)
weights = np.array(sol['x']).flatten()
port_return = np.dot(weights, expected_returns)
port_risk = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
return weights, port_risk, port_return
# Find maximum return portfolio
def solve_max_return():
"""Find the portfolio with maximum return"""
c = matrix(-expected_returns) # Negative because we minimize -r^T w
# Constraint: w ≥ 0
G = matrix(-np.eye(n_assets))
h = matrix(np.zeros(n_assets))
# Constraint: sum(w) = 1
A = matrix(np.ones((1, n_assets)))
b = matrix(np.ones(1))
sol = solvers.lp(c, G, h, A, b)
weights = np.array(sol['x']).flatten()
port_return = np.dot(weights, expected_returns)
port_risk = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
return weights, port_return, port_risk
# Find the extreme portfolios
min_risk_weights, min_risk_value, min_risk_return = solve_min_risk()
max_return_weights, max_return_value, max_return_risk = solve_max_return()
print("\nMinimum Risk Portfolio:")
print(f"Risk: {min_risk_value:.6f}")
print(f"Return: {min_risk_return:.6f}")
print("\nMaximum Return Portfolio:")
print(f"Return: {max_return_value:.6f}")
print(f"Risk: {max_return_risk:.6f}")
Expected Returns (sample):
SafeAndCare: 0.9583
Moneymakers: 1.1721
Fuel4: 0.7241
CPU-XYZ: 1.1315
MarsProject: 1.1487
Covariance Matrix (sample):
[[ 1.01854023e-04 6.69455358e-05 -1.71112855e-05]
[ 6.69455358e-05 1.24950579e-03 1.45858732e-04]
[-1.71112855e-05 1.45858732e-04 2.97224717e-04]]
pcost dcost gap pres dres
0: 3.0411e-05 -1.0000e+00 1e+00 3e-16 5e+00
1: 3.0400e-05 -1.0040e-02 1e-02 2e-16 5e-02
2: 2.9347e-05 -1.3904e-04 2e-04 1e-16 8e-04
3: 1.5808e-05 -1.9400e-05 4e-05 4e-17 1e-04
4: 6.5780e-06 -3.3258e-06 1e-05 3e-16 1e-19
5: 5.2131e-06 3.7689e-06 1e-06 1e-16 3e-20
6: 4.8912e-06 4.7553e-06 1e-07 7e-17 2e-20
7: 4.8531e-06 4.8448e-06 8e-09 3e-17 2e-20
Optimal solution found.
pcost dcost gap pres dres k/t
0: -9.7956e-01 -9.7956e-01 1e+00 2e-16 1e+00 1e+00
1: -1.0425e+00 -1.0562e+00 4e-01 2e-16 3e-01 2e-01
2: -1.1368e+00 -1.1735e+00 4e-01 1e-16 3e-01 2e-01
3: -1.3797e+00 -1.3977e+00 5e-02 2e-16 3e-02 2e-03
4: -1.3929e+00 -1.3933e+00 9e-04 1e-16 6e-04 5e-05
5: -1.3935e+00 -1.3935e+00 9e-06 3e-16 6e-06 5e-07
6: -1.3935e+00 -1.3935e+00 9e-08 2e-16 6e-08 5e-09
Optimal solution found.
Minimum Risk Portfolio:
Risk: 0.003115
Return: 1.075245
Maximum Return Portfolio:
Return: 1.393517
Risk: 0.009110
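As a numpy-only sanity check on solve_min_risk: when the no-short-sale bound $\boldsymbol{w} \geq \boldsymbol{0}$ is inactive, the fully-invested minimum-variance portfolio has the closed form $\boldsymbol{w}^* = \boldsymbol{\Sigma}^{-1}\boldsymbol{1} / (\boldsymbol{1}^T\boldsymbol{\Sigma}^{-1}\boldsymbol{1})$. A sketch on an invented 2-asset covariance matrix:

```python
import numpy as np

# Toy 2-asset covariance matrix (values invented for the check).
cov = np.array([[1.0, 0.0],
                [0.0, 2.0]])
ones = np.ones(2)

inv_cov_ones = np.linalg.solve(cov, ones)        # Sigma^{-1} 1
w_star = inv_cov_ones / (ones @ inv_cov_ones)    # normalize so the weights sum to 1

min_var = w_star @ cov @ w_star                  # variance of the optimal portfolio
```

Here the lower-variance asset gets twice the weight (w* = [2/3, 1/3]); the QP solver should reproduce this whenever its nonnegativity constraints are slack.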
Implementation of the two multi-objective methods:¶
- Weighted Sum Method (WSM)
- Epsilon-Constraint Method (ECM)
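In the notation of the QP tutorial above, with $\boldsymbol{r}$ the vector of expected gross returns and $\boldsymbol{\Sigma}$ the covariance matrix, the two scalarized subproblems solved below are (WSM shown without the optional range normalization):

$min$ $\dfrac{w_{risk}}{2}\boldsymbol{x}^T\boldsymbol{\Sigma}\boldsymbol{x} - w_{return}\boldsymbol{r}^T\boldsymbol{x}$

$s.t.$ $\boldsymbol{1}^T\boldsymbol{x} = 1$, $\boldsymbol{x} \geq \boldsymbol{0}$

and, for a grid of return targets $\varepsilon$ between the minimum-risk and maximum-return portfolios,

$min$ $\dfrac{1}{2}\boldsymbol{x}^T\boldsymbol{\Sigma}\boldsymbol{x}$

$s.t.$ $\boldsymbol{r}^T\boldsymbol{x} = \varepsilon$, $\boldsymbol{1}^T\boldsymbol{x} = 1$, $\boldsymbol{x} \geq \boldsymbol{0}$

(the implementation below imposes the return target as an equality constraint).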
# Weighted Sum Method (WSM)
def weighted_sum_method(n_weights=20, normalize=True):
"""
Implements the Weighted Sum Method for portfolio optimization.
Args:
n_weights: Number of weight vectors to use
normalize: If True, scale each objective by its range between the extreme portfolios
Returns:
List of Pareto optimal solutions (weights, return, risk)
"""
# Set up normalization ranges if needed
if normalize:
return_range = max_return_value - min_risk_return
risk_range = max_return_risk - min_risk_value
# Generate weight vectors
weight_vectors = []
for i in range(n_weights):
w_return = i / (n_weights - 1) # Weight for return
w_risk = 1 - w_return # Weight for risk
weight_vectors.append((w_return, w_risk))
# Solve for each weight vector
solutions = []
for w_return, w_risk in weight_vectors:
# Set up the weighted QP objective (cvxopt minimizes (1/2) x^T P x + q^T x)
if normalize:
# For normalized objective function
q_mod = matrix(-w_return * expected_returns / return_range)
P_mod = matrix(w_risk * cov_matrix / risk_range)
else:
# For non-normalized objective function
q_mod = matrix(-w_return * expected_returns)
P_mod = matrix(w_risk * cov_matrix)
# Constraint: w ≥ 0
G = matrix(-np.eye(n_assets))
h = matrix(np.zeros(n_assets))
# Constraint: sum(w) = 1
A = matrix(np.ones((1, n_assets)))
b = matrix(np.ones(1))
# Solve QP
sol = solvers.qp(P_mod, q_mod, G, h, A, b)
if sol['status'] != 'optimal':
print(f"Warning: Optimization did not reach an optimal solution for weights {w_return}, {w_risk}. Status: {sol['status']}")
continue
weights = np.array(sol['x']).flatten()
port_return = np.dot(weights, expected_returns)
port_risk = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
solutions.append((weights, port_return, port_risk))
# Filter duplicate solutions
unique_solutions = []
for sol in solutions:
is_unique = True
for existing_sol in unique_solutions:
if (np.abs(sol[1] - existing_sol[1]) < 1e-6 and
np.abs(sol[2] - existing_sol[2]) < 1e-6):
is_unique = False
break
if is_unique:
unique_solutions.append(sol)
return unique_solutions
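The pairwise duplicate filter above is O(n²); an equivalent idea that scales better is to key each solution by its rounded (return, risk) pair and deduplicate with a set. A sketch on invented solution triples in the same (weights, return, risk) format:

```python
import numpy as np

# Solutions in the same (weights, return, risk) format as weighted_sum_method,
# with values invented so that the first two differ by less than 1e-6.
solutions = [
    (np.array([1.0, 0.0]), 1.3935170, 0.0091100),
    (np.array([1.0, 0.0]), 1.3935171, 0.0091101),  # near-duplicate of the first
    (np.array([0.5, 0.5]), 1.2000000, 0.0050000),
]

seen, unique_solutions = set(), []
for sol in solutions:
    key = (round(sol[1], 6), round(sol[2], 6))  # bucket by ~1e-6 tolerance
    if key not in seen:
        seen.add(key)
        unique_solutions.append(sol)
```

Rounding buckets are not identical to a pairwise tolerance test (two points can straddle a bucket boundary), but at the 1e-6 scale used here the difference is negligible for counting unique Pareto points.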
# Epsilon-Constraint Method (ECM)
def epsilon_constraint_method(n_thresholds=20):
"""
Implements the Epsilon-Constraint Method for portfolio optimization.
Args:
n_thresholds: Number of threshold values to use
Returns:
List of Pareto optimal solutions (weights, return, risk)
"""
# Generate threshold values
thresholds = []
for i in range(n_thresholds):
t = min_risk_return + (i / (n_thresholds - 1)) * (max_return_value - min_risk_return)
thresholds.append(t)
# Solve for each threshold
solutions = []
for threshold in thresholds:
# Setup quadratic programming parameters
P = matrix(cov_matrix)
q = matrix(np.zeros(n_assets))
# Constraint: w ≥ 0
G = matrix(-np.eye(n_assets))
h = matrix(np.zeros(n_assets))
# Equality constraints: sum(w) = 1 and r^T w = threshold.
# Note: the return target enters through (A, b) as an equality, not an inequality;
# since the thresholds sweep [min-risk return, max return], the constraint would
# be binding at the optimum anyway.
A = matrix(np.vstack((
np.ones(n_assets), # sum(w) = 1
expected_returns # r^T w = threshold
)))
b = matrix(np.array([1.0, threshold]))
# Solve QP
sol = solvers.qp(P, q, G, h, A, b)
if sol['status'] != 'optimal':
print(f"Warning: Optimization did not reach an optimal solution for threshold {threshold}. Status: {sol['status']}")
continue
weights = np.array(sol['x']).flatten()
port_return = np.dot(weights, expected_returns)
port_risk = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
solutions.append((weights, port_return, port_risk))
# Filter duplicate solutions
unique_solutions = []
for sol in solutions:
is_unique = True
for existing_sol in unique_solutions:
if (np.abs(sol[1] - existing_sol[1]) < 1e-6 and
np.abs(sol[2] - existing_sol[2]) < 1e-6):
is_unique = False
break
if is_unique:
unique_solutions.append(sol)
return unique_solutions
# Function to plot Pareto front
def plot_pareto_front(solutions, title):
risks = [sol[2] for sol in solutions]
returns = [sol[1] for sol in solutions]
plt.figure(figsize=(10, 6))
plt.scatter(risks, returns, c='red', marker='o')
plt.xlabel('Risk [σ]')  # port_risk is a standard deviation (square root of the variance)
plt.ylabel('Expected Return')
plt.title(title)
plt.grid(True)
plt.show()
# Test the methods with a small number of points
wsm_solutions = weighted_sum_method(10)
ecm_solutions = epsilon_constraint_method(10)
print(f"WSM found {len(wsm_solutions)} unique solutions")
print(f"ECM found {len(ecm_solutions)} unique solutions")
plot_pareto_front(wsm_solutions, "WSM Pareto Front")
plot_pareto_front(ecm_solutions, "ECM Pareto Front")
[cvxopt iteration logs for the 20 QP solves omitted: 10 WSM weight vectors followed by 10 ECM thresholds, each run ending with "Optimal solution found."]
WSM found 2 unique solutions
ECM found 10 unique solutions
Experimentation¶
- Run WSM and ECM with different numbers of sample points.
- Plot the resulting Pareto fronts.
- Compare how many unique solutions each approach yields.
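Since the task also asks to compare the methods' efficiency, a small timing helper can complement the unique-solution counts. This is a generic sketch; `time_method` is a hypothetical helper, and the intended arguments are the `weighted_sum_method` and `epsilon_constraint_method` functions defined above.

```python
import time

def time_method(fn, *args, repeats=3):
    """Best-of-n wall-clock time for fn(*args); a rough efficiency proxy."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Example usage (assuming the methods defined above):
# print(time_method(weighted_sum_method, 50))
# print(time_method(epsilon_constraint_method, 50))
```

Best-of-n is less noisy than a single run; pairing the timing with the unique-solution count gives a cost-per-unique-solution comparison.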
# Run WSM and ECM with different numbers of weight vectors/thresholds
point_counts = [10, 20, 50, 100, 200]
wsm_results = {}
ecm_results = {}
# Define different markers for each point count
markers = {
10: 'o', # circle
20: 's', # square
50: '^', # triangle up
100: 'd', # diamond
200: 'p' # pentagon
}
# Colors for each point count
colors = {
10: '#1f77b4', # blue
20: '#ff7f0e', # orange
50: '#2ca02c', # green
100: '#d62728', # red
200: '#9467bd' # purple
}
# Suppress solver output
solvers.options['show_progress'] = False
for count in point_counts:
print(f"Running with {count} points...")
# Run WSM with normalization
wsm_solutions = weighted_sum_method(count, normalize=True)
wsm_results[count] = {
'solutions': wsm_solutions,
'unique_count': len(wsm_solutions)
}
# Run ECM
ecm_solutions = epsilon_constraint_method(count)
ecm_results[count] = {
'solutions': ecm_solutions,
'unique_count': len(ecm_solutions)
}
print(f" WSM found {len(wsm_solutions)} unique solutions")
print(f" ECM found {len(ecm_solutions)} unique solutions")
# Bar chart for number of unique solutions
plt.figure(figsize=(12, 6))
x = np.arange(len(point_counts))
width = 0.35
wsm_counts = [wsm_results[c]['unique_count'] for c in point_counts]
ecm_counts = [ecm_results[c]['unique_count'] for c in point_counts]
plt.bar(x - width/2, wsm_counts, width, label='WSM', color='royalblue')
plt.bar(x + width/2, ecm_counts, width, label='ECM', color='tomato')
plt.xlabel('Number of Points', fontsize=12)
plt.ylabel('Number of Unique Solutions', fontsize=12)
plt.title('Comparison of Methods: Unique Solutions Found', fontsize=14)
plt.xticks(x, point_counts, fontsize=11)
plt.yticks(fontsize=11)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.show()
# Create separate plots for each point count
for count in point_counts:
plt.figure(figsize=(10, 5))
wsm_solutions = wsm_results[count]['solutions']
ecm_solutions = ecm_results[count]['solutions']
wsm_risks = [sol[2] for sol in wsm_solutions]
wsm_returns = [sol[1] for sol in wsm_solutions]
ecm_risks = [sol[2] for sol in ecm_solutions]
ecm_returns = [sol[1] for sol in ecm_solutions]
plt.scatter(wsm_risks, wsm_returns, marker='o', s=100, label='WSM', color='royalblue', alpha=0.8, edgecolors='navy')
plt.scatter(ecm_risks, ecm_returns, marker='x', s=100, label='ECM', color='tomato', alpha=0.8)
# Add connecting lines to show the fronts more clearly
wsm_points = sorted(zip(wsm_risks, wsm_returns), key=lambda x: x[0])
ecm_points = sorted(zip(ecm_risks, ecm_returns), key=lambda x: x[0])
if wsm_points:
wsm_x, wsm_y = zip(*wsm_points)
plt.plot(wsm_x, wsm_y, '-', color='royalblue', alpha=0.5)
if ecm_points:
ecm_x, ecm_y = zip(*ecm_points)
plt.plot(ecm_x, ecm_y, '-', color='tomato', alpha=0.5)
plt.xlabel('Risk [σ]', fontsize=12)
plt.ylabel('Expected Return', fontsize=12)
plt.title(f'Pareto Front Comparison with {count} Points', fontsize=14)
plt.grid(True, alpha=0.3)
plt.legend(fontsize=12, loc='best')
plt.tight_layout()
plt.show()
# Combined plot with WSM only (to see its progression better)
plt.figure(figsize=(12, 7))
# Use fewer points for clarity - just use 10, 50, and 100 points
selected_counts = [10, 50, 100]
line_styles = {10: '-', 50: '--', 100: '-.'}
for count in selected_counts:
solutions = wsm_results[count]['solutions']
risks = [sol[2] for sol in solutions]
returns = [sol[1] for sol in solutions]
# Connect points for better visualization
points = sorted(zip(risks, returns), key=lambda x: x[0])
if points:
x_vals, y_vals = zip(*points)
# Use larger, more distinct markers and plot fewer of them (every nth point)
n = max(1, len(x_vals) // 20)
plt.scatter(x_vals[::n], y_vals[::n], s=150, label=f'WSM ({count} points)',
color=colors[count], marker=markers[count], alpha=0.9, edgecolors='black', linewidth=1.5)
# Use different line styles for different point counts
plt.plot(x_vals, y_vals, line_styles[count], color=colors[count],
alpha=0.7, linewidth=2.5)
plt.xlabel('Risk [σ]', fontsize=14)
plt.ylabel('Expected Return', fontsize=14)
plt.title('WSM Pareto Fronts Comparison', fontsize=16)
plt.grid(True, alpha=0.3)
plt.legend(fontsize=13, loc='lower right')
plt.tight_layout()
plt.show()
# Combined plot with ECM only (to see its progression better)
plt.figure(figsize=(12, 7))
for count in selected_counts:
solutions = ecm_results[count]['solutions']
risks = [sol[2] for sol in solutions]
returns = [sol[1] for sol in solutions]
# Connect points for better visualization
points = sorted(zip(risks, returns), key=lambda x: x[0])
if points:
x_vals, y_vals = zip(*points)
# Use larger, more distinct markers and plot fewer of them
n = max(1, len(x_vals) // 20)
plt.scatter(x_vals[::n], y_vals[::n], s=150, label=f'ECM ({count} points)',
color=colors[count], marker=markers[count], alpha=0.9, edgecolors='black', linewidth=1.5)
# Use different line styles for different point counts
plt.plot(x_vals, y_vals, line_styles[count], color=colors[count],
alpha=0.7, linewidth=2.5)
plt.xlabel('Risk [σ]', fontsize=12)
plt.ylabel('Expected Return', fontsize=12)
plt.title('ECM Pareto Fronts Comparison', fontsize=14)
plt.grid(True, alpha=0.3)
plt.legend(fontsize=12)
plt.tight_layout()
plt.show()
Running with 10 points...
  WSM found 2 unique solutions
  ECM found 10 unique solutions
Running with 20 points...
  WSM found 4 unique solutions
  ECM found 20 unique solutions
Running with 50 points...
  WSM found 7 unique solutions
  ECM found 50 unique solutions
Running with 100 points...
  WSM found 11 unique solutions
  ECM found 100 unique solutions
Running with 200 points...
  WSM found 19 unique solutions
  ECM found 200 unique solutions
Conclusions¶
The Epsilon-Constraint Method (ECM) finds far more unique Pareto optimal solutions than the Weighted Sum Method (WSM), even when WSM normalizes the objectives. This matches the theory: WSM cannot reach Pareto points in non-convex regions of the front, and even where the front is convex, a uniform grid of weight vectors maps very unevenly onto it, so many weights collapse onto the same few portfolios.
Key observations from the visualizations:
- ECM produces a denser, more evenly distributed Pareto front, with one distinct solution per return threshold
- WSM clusters its solutions near the extremes of the front, leaving gaps in between
- As the number of sample points grows, ECM keeps finding new unique solutions along the entire front, while WSM's yield grows far more slowly (19 unique solutions from 200 weight vectors)
- Even with 200 weight vectors, WSM covers the middle of the front only sparsely
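The WSM limitation can be demonstrated on a self-contained toy problem (unrelated to the portfolio data): five Pareto optimal candidate points where the middle three lie above the line joining the extremes. No convex combination of the two minimized objectives ever selects a middle point.

```python
import numpy as np

# Toy bi-objective problem: minimize (f1, f2) over five candidate points.
# The middle points lie in a non-convex part of the Pareto front.
points = np.array([
    [0.0, 1.0],   # extreme A
    [0.2, 0.9],   # non-convex region
    [0.5, 0.8],   # non-convex region
    [0.8, 0.6],   # non-convex region
    [1.0, 0.0],   # extreme B
])

found = set()
for lam in np.linspace(0.0, 1.0, 101):
    scores = lam * points[:, 0] + (1.0 - lam) * points[:, 1]
    found.add(int(np.argmin(scores)))

print(sorted(found))  # [0, 4] — only the extremes are ever weighted-sum optimal
```

An epsilon-constraint sweep over f1 thresholds would recover all five points, since each solve only has to minimize f2 subject to a bound on f1.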
Portfolio Game¶
# Define the asset names in the correct order
asset_names_ordered = [
"SuperFuture", "Apples", "WorldNow", "Electronics123", "Photons",
"SpaceNow", "PearPear", "PositiveCorrelation", "BetterTechnology", "ABCDE",
"EnviroLike", "Moneymakers", "Fuel4", "MarsProject", "CPU-XYZ",
"RoboticsX", "Lasers", "WaterForce", "SafeAndCare", "BetterTomorrow"
]
# Combine solutions from both methods for a more comprehensive Pareto front
wsm_solutions = wsm_results[200]['solutions']
ecm_solutions = ecm_results[200]['solutions']
all_solutions = wsm_solutions + ecm_solutions
# Analyze the risk and return characteristics of all portfolios
risks = [sol[2] for sol in all_solutions]
returns = [sol[1] for sol in all_solutions]
# Calculate return threshold (median return)
return_50th = np.percentile(returns, 50)
print(f"Return threshold (median): {return_50th:.6f}")
# Find MinRisk portfolio with above-median return
selected_idx = -1
min_risk_value = float('inf')
for i, (weights, port_return, port_risk) in enumerate(all_solutions):
if port_return > return_50th and port_risk < min_risk_value:
min_risk_value = port_risk
selected_idx = i
# Get the selected portfolio
selected_weights, selected_return, selected_risk = all_solutions[selected_idx]
selected_source = "WSM" if selected_idx < len(wsm_solutions) else "ECM"
selected_sharpe = selected_return / selected_risk if selected_risk > 0 else 0  # return/risk ratio (Sharpe with zero risk-free rate)
print(f"\nSelected MinRisk Portfolio (from {selected_source}):")
print(f"Expected Return: {selected_return:.6f}")
print(f"Expected Risk: {selected_risk:.6f}")
print(f"Sharpe Ratio: {selected_sharpe:.6f}")
print(f"Strategy: Minimum risk with above-median return")
# Map weights to the correct order
portfolio_weights = {}
for i, weight in enumerate(selected_weights):
portfolio_weights[asset_names[i]] = weight
# Create ordered weights list
ordered_weights = []
for asset in asset_names_ordered:
if asset in portfolio_weights:
ordered_weights.append(portfolio_weights[asset])
else:
ordered_weights.append(0.0)
# Assert weights sum to 1
sum_weights = sum(ordered_weights)
assert abs(sum_weights - 1.0) < 1e-10, f"Portfolio weights must sum to 1.0. Current sum: {sum_weights}"
print(f"Sum of weights: {sum(ordered_weights)}")
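Interior-point solvers return weights that sum to 1 only up to solver tolerance and can carry tiny negative components. A hedged sketch of a clean-up step that could be applied before the strict assertion above (`clean_weights` is an illustrative helper, not part of the notebook's pipeline):

```python
import numpy as np

def clean_weights(w, tol=1e-9):
    """Zero out numerically-negligible entries from an interior-point
    solution, then renormalize so the weights sum to 1."""
    w = np.asarray(w, dtype=float)
    w = np.where(w < tol, 0.0, w)   # drops tiny negatives and near-zeros
    return w / w.sum()

w = clean_weights([0.50000001, 0.29999999, 0.2, -1e-12])
print(w.sum())  # sums to 1 up to floating-point rounding
```

Clipping before renormalizing also guarantees the long-only constraint w ≥ 0 holds exactly in the reported portfolio.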
# Plot the Pareto fronts with the selected portfolio
plt.figure(figsize=(10, 6))
# Plot each method's front
plt.scatter([sol[2] for sol in wsm_solutions], [sol[1] for sol in wsm_solutions],
c='blue', s=30, alpha=0.6, label='WSM')
plt.scatter([sol[2] for sol in ecm_solutions], [sol[1] for sol in ecm_solutions],
c='green', s=30, alpha=0.6, label='ECM')
# Highlight the selected portfolio
plt.scatter([selected_risk], [selected_return], c='red', s=200, marker='*',
edgecolors='black', linewidth=1, label='MinRisk Portfolio')
# Add a horizontal line at the median return
plt.axhline(y=return_50th, color='grey', linestyle='--', alpha=0.7)
plt.text(min(risks), return_50th+0.01, 'Median Return', fontsize=10)
# Add annotation for the selected portfolio
plt.annotate(f"Return: {selected_return:.4f}\nRisk: {selected_risk:.4f}",
xy=(selected_risk, selected_return),
xytext=(selected_risk+0.0008, selected_return-0.05),
arrowprops=dict(facecolor='black', shrink=0.05),
bbox=dict(boxstyle="round", fc="white"))
plt.xlabel('Risk [σ]')
plt.ylabel('Expected Return')
plt.title('MinRisk Portfolio Selection')
plt.grid(alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()
# Create a bar chart of the portfolio weights
plt.figure(figsize=(12, 6))
# Get non-zero weights and sort by weight
asset_weight_pairs = [(asset, weight) for asset, weight in zip(asset_names_ordered, ordered_weights)
if weight > 0.001]
asset_weight_pairs.sort(key=lambda x: x[1], reverse=True)
sorted_assets = [pair[0] for pair in asset_weight_pairs]
sorted_weights = [pair[1] for pair in asset_weight_pairs]
# Create bar chart
bars = plt.bar(sorted_assets, sorted_weights, color='skyblue', edgecolor='blue')
# Add weight values on bars
for bar in bars:
height = bar.get_height()
plt.text(bar.get_x() + bar.get_width()/2., height + 0.005,
f'{height:.2%}', ha='center', va='bottom')
plt.xlabel('Assets')
plt.ylabel('Weight')
plt.title('MinRisk Portfolio Asset Allocation')
plt.xticks(rotation=45, ha='right')
plt.grid(alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
# Save the portfolio to a file
output_filename = "151958_151960.txt"
with open(output_filename, 'w') as f:
f.write(f"{selected_return:.6f} {selected_risk:.6f}")
for weight in ordered_weights:
f.write(f" {weight:.6f}")
print(f"Portfolio saved to {output_filename}")
Return threshold (median): 1.247976

Selected MinRisk Portfolio (from ECM):
Expected Return: 1.249575
Expected Risk: 0.004573
Sharpe Ratio: 273.250791
Strategy: Minimum risk with above-median return
Sum of weights: 1.0
Portfolio saved to 151958_151960.txt
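As a sanity check, the saved line can be re-parsed and validated. This sketch assumes the single-line format written above (return, risk, then one weight per asset); `parse_portfolio_line` is an illustrative helper.

```python
def parse_portfolio_line(line, n_assets=20, tol=1e-4):
    """Parse and validate one saved-portfolio line: 'return risk w1 ... wN'."""
    values = [float(v) for v in line.split()]
    assert len(values) == 2 + n_assets, "expected return, risk, and one weight per asset"
    ret, risk, weights = values[0], values[1], values[2:]
    assert abs(sum(weights) - 1.0) < tol, "weights should sum to 1"
    assert all(w >= -tol for w in weights), "weights should be non-negative"
    return ret, risk, weights

# Example against the file written above (uncomment to run):
# with open(output_filename) as f:
#     print(parse_portfolio_line(f.read()))
```

Catching a malformed submission file here is much cheaper than discovering it after the portfolio game deadline.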
Our Selected Portfolio¶
We chose a portfolio optimization strategy that balances risk minimization with reasonable returns.
- Expected Return: 1.2496 (24.96% expected growth)
- Risk Level: 0.0046 (very low volatility)
- Position on Pareto Front: The portfolio sits just above the median-return line, near the "elbow" of the front where additional return starts to cost disproportionately more risk
- Method Used: We found the ECM (Epsilon-Constraint Method) produced better results for our specific criteria
Our Asset Allocation¶
Our analysis led us to a focused investment strategy:
- Photons (56.06%): We chose this as our primary investment due to its excellent risk-adjusted returns
- SafeAndCare (16.10%): We allocated a significant portion here for stability
- WorldNow (9.64%): We included this for additional growth potential
- Diversified holdings: We spread smaller investments across 9 additional assets to mitigate asset-specific risk